vision LMs are saturating benchmarks, so we built vibe eval 💬
> compare different models on refreshed in-the-wild examples across different categories 🤠
> submit your favorite model for eval
no numbers -- just vibes!
🎵 Dream come true for content creators! TIGER AI can extract voice, effects & music from ANY audio file 🤯 This lightweight model uses frequency band-split technology to separate speech like magic. Kudos to @fffiloni for the amazing demo! fffiloni/TIGER-audio-extraction
🔥 New benchmark & dataset for Subject-to-Video generation
OpenS2V-Nexus by Peking University
✨ Fine-grained evaluation for subject consistency: BestWishYsh/OpenS2V-Eval
✨ 5M-scale dataset: BestWishYsh/OpenS2V-5M
✨ New metrics: automatic scores for identity, realism, and text match
✨ Emotion-controlled, highly dynamic avatar videos
✨ Multi-character support with separate audio control
✨ Works with any style (cartoon, 3D, real face) while keeping identity consistent
emerging trend: models that can understand image + text and generate image + text
don't miss out ⤵️
> MMaDA: a single 8B diffusion model aligned with CoT (reasoning!) + UniGRPO Gen-Verse/MMaDA
> BAGEL: a 7B MoT model based on Qwen2.5, SigLIP-so-400M, and the Flux VAE ByteDance-Seed/BAGEL
both by ByteDance! 😱
Just completed the AI Agents course and wow, that capstone project really makes you understand how to build agents that can handle real-world complexity!
The final project uses the GAIA dataset - your agent has to solve tasks like analyzing Excel files, processing audio recordings, answering questions about YouTube videos, and diving into research papers. These aren't toy examples; they're the messy, multimodal tasks agents need to handle in practice.
Whether you’re just getting started with agents or want to go deeper with tools like LangChain, LlamaIndex, and SmolAgents, this course has tons of useful stuff. A few key insights:
- Code agents are incredibly versatile once you get the architecture right (see the sketch below)
- The sweet spot is finding the right balance of guidance vs. autonomy for each use case
- Once the logic clicks, the possibilities really are endless: it's like letting LLMs break free from the chatbox
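For a taste of how simple the starting point is, here's a minimal code-agent sketch with smolagents (class names are from the early-2025 releases and may have been renamed since, e.g. HfApiModel to InferenceClientModel; treat the exact imports as assumptions):

```python
# Minimal sketch of a code agent with smolagents; API names may have changed
# in newer releases, so treat the exact imports as assumptions.
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# The agent writes and runs Python snippets to solve the task, calling the
# search tool whenever it needs outside information.
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel(),  # defaults to a hosted instruct model on the Hub
)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")
```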
The course is free and the certification deadline is July 1st, 2025.
A new research paper from KAIST builds on smolagents to push the boundaries of distillation 🥳 ➡️ "Distilling LLM Agent into Small Models with Retrieval and Code Tools" shows that, when distilling reasoning capability from a strong LLM ("teacher") into a smaller one ("student"), it's much better to use agent traces than CoT traces.
The advantages:
1. Improved generalization. Intuitively, this is because the agent can encounter more "surprising" results by interacting with its environment: for example, a web search called by the LLM teacher in agent mode can surface results that the teacher would not have generated in CoT.
2. Reduced hallucinations. The trace won't hallucinate tool call outputs!
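To make the contrast concrete, here's a hypothetical illustration of a CoT trace versus an agent trace as training samples (field names are illustrative, not the paper's schema):

```python
# Hypothetical illustration of the two kinds of teacher traces (field names
# are ours for illustration, not the paper's exact schema).

# CoT trace: the teacher reasons only from its own parameters, so the
# intermediate "facts" and the final answer can be hallucinated.
cot_sample = {
    "question": "Which team won the 2022-23 Serie A, and when was their previous title?",
    "trace": "Let's think step by step... Napoli won in 2023; their previous title was in 1990.",
    "answer": "Napoli; previous title in 1990",
}

# Agent trace: the teacher calls tools (retrieval, code), so intermediate
# observations come from the environment instead of being generated. The
# student sees results the teacher alone might never have produced, and the
# tool outputs in the trace cannot be hallucinated.
agent_sample = {
    "question": "Which team won the 2022-23 Serie A, and when was their previous title?",
    "trace": [
        {"thought": "Look up the 2022-23 Serie A champion.",
         "tool": "web_search", "input": "2022-23 Serie A winner",
         "observation": "SSC Napoli won the 2022-23 Serie A title."},
        {"thought": "Find Napoli's previous league title.",
         "tool": "web_search", "input": "Napoli previous Serie A title before 2023",
         "observation": "Napoli's previous Serie A title was in 1989-90."},
    ],
    "answer": "Napoli; previous title in 1989-90",
}
```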
multimodal 💬🖼️
> new moondream (VLM) is out: a 4-bit quantized (QAT) version of moondream-2b that runs on 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS) 🌚
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text; they also released Dolphin, a document parsing VLM 🐬 (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, plus Gemma 3n, an omni model with competitive LLM performance
> MMaDA is a new 8B diffusion language model that can generate both image and text
LLMs
> Mistral released Devstral, a 24B coding assistant (OS) 👩🏻💻
> Fairy R1-32B is a new reasoning model -- a distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released ACEReason-Nemotron-14B, a new 14B math and code reasoning model
> sarvam-m is a new Indic LM with a hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)
image generation 🎨
> MTVCrafter is a new human motion animation generator
Two lines in your terminal and you have an AI agent running whatever model and tools you want 🤯
Just tried the new Tiny Agents in Python. Asked it which team won the Italian Serie A soccer league and to export the final table to CSV. Coolest thing is you can interact with the agent, guide it, and correct its mistakes.
The agent connected to web browsing tools, searched for Serie A standings, identified the champion, and generated a CSV export.
The setup:
pip install "huggingface_hub[mcp]>=0.32.0"
tiny-agents run
That's it. The MCP protocol handles all the tool integrations automatically - no custom APIs to write, no complex setups. Want file system access? It's already there. Need web browsing? Built in.
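Those built-in tools come from MCP servers declared in the agent's config. As a rough sketch (the field names below are assumptions from memory, not a guaranteed schema; check the huggingface_hub / tiny-agents docs for the exact format), a config declares the model, the inference provider, and the MCP servers that expose tools:

```python
# Rough sketch of an agent config, written as a Python dict and saved as the
# JSON file the CLI reads. Field names are assumptions from memory -- check
# the huggingface_hub / tiny-agents docs for the exact schema.
import json

agent_config = {
    "model": "Qwen/Qwen2.5-72B-Instruct",  # any tool-calling chat model
    "provider": "nebius",                  # swap inference providers here
    "servers": [
        {   # an MCP server launched as a local process, e.g. for web browsing
            "type": "stdio",
            "command": "npx",
            "args": ["@playwright/mcp@latest"],
        },
    ],
}

with open("agent.json", "w") as f:
    json.dump(agent_config, f, indent=2)
```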
You can swap models, change inference providers, run local models, or add new tools just by editing a simple JSON config. You can also use Gradio Spaces as MCP servers! The entire agent is ~70 lines of Python - essentially a while loop that streams responses and executes tools. Everything is open-source. ❤️ Hugging Face
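Here's a schematic of that while loop, as a sketch rather than the real implementation (call_mcp_tool is a hypothetical stand-in for routing tool calls to MCP servers, and the real code also handles streaming, provider selection, and message serialization):

```python
# Schematic of the core agent loop -- not the actual tiny-agents code.
# call_mcp_tool is a hypothetical stand-in; details are glossed over.
from huggingface_hub import InferenceClient

def call_mcp_tool(tool_call) -> str:
    """Hypothetical helper: execute one tool call via its MCP server."""
    raise NotImplementedError

def run_agent(prompt: str, model: str = "Qwen/Qwen2.5-72B-Instruct", tools=None):
    client = InferenceClient(model)
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = client.chat_completion(messages=messages, tools=tools)
        msg = reply.choices[0].message
        if not msg.tool_calls:          # model answered directly: we're done
            return msg.content
        # Record the tool request, execute it, and feed the result back
        messages.append({"role": "assistant", "tool_calls": msg.tool_calls})
        for call in msg.tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": call_mcp_tool(call),
            })
```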